{
 "cells": [
  {
      "cell_type": "markdown",
      "metadata": {
        "id": "EROAwrOzmR_P",
        "colab_type": "text"
      },
      "source": [
        "![人工智慧 - 自由團隊](https://raw.githubusercontent.com/chenkenanalytic/img/master/af/aifreeteam.png)\n",
        "\n",
        "\n",
        "<center>Welcome to the data deployment practice of Traditional Chinese Handwriting Dataset by AI . FREE Team.</center>\n",
        "<br>\n",
        "<center>歡迎大家來到 AI . FREE Team 所開發的繁體中文手寫資料部署實作。 </center>\n",
        "<br>\n",
        "\n",
        "<center>(Author: Chen Ken, Yen-Lin 博士；Date of published: 2020/4/21；AI . FREE Team Website: https://aifreeblog.herokuapp.com/)</center>"
      ]
    },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 資料部署步驟說明       \n",
    "\n",
    "* 部署前準備:   \n",
    "  下載本專案且解壓縮 data 資料夾內的四個壓檔檔案至 D 槽  \n",
    "<p> \n",
    "* 部署資料:  \n",
    "  Step 1: 建立存放繁體中文手寫資料的資料夾( OutputPath )   \n",
    "  Step 2: 解壓縮壓縮檔，且將全部的資料移至 OutputPath    \n",
    "  Step 3: 依繁體中文單字建立子資料夾分類資料"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "OutputFolder = 'Traditional_Chinese_Data'\n",
    "DrivePath = 'D:/'\n",
    "OutputPath = DrivePath + OutputFolder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Create the new \"Traditional_Chinese_Data\" folder in current working directory.\n",
      "Current Working Directory: D:\\\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import zipfile\n",
    "import shutil\n",
    "\n",
    "if not os.path.exists( OutputPath ):\n",
    "    os.chdir( DrivePath ) # Change the current working directory to DrivePath.\n",
    "    os.mkdir( OutputFolder ) # Create the new directory in DrivePath.\n",
    "    print( f'Create the new \"{OutputFolder}\" folder in current working directory.' )\n",
    "else: os.chdir( DrivePath )\n",
    "\n",
    "print( f'Current Working Directory: {os.getcwd()}' )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Step 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Decompress successfully D:\\cleaned_data(50_50)-20200420T071507Z-001.zip ......\n",
      "Decompress successfully D:\\cleaned_data(50_50)-20200420T071507Z-002.zip ......\n",
      "Decompress successfully D:\\cleaned_data(50_50)-20200420T071507Z-003.zip ......\n",
      "Decompress successfully D:\\cleaned_data(50_50)-20200420T071507Z-004.zip ......\n",
      "Moving images according to traditional Chinese characters......\n",
      "Data Deployment completed.\n"
     ]
    }
   ],
   "source": [
    "CompressedFiles = []\n",
    "for item in os.listdir():  \n",
    "    if item.endswith( '.zip' ): # Check for \".zip\" extension.\n",
    "        file_path = os.path.abspath( item ) # Get full path of the compressed file. \n",
    "        CompressedFiles.append( file_path )\n",
    "\n",
    "for file in CompressedFiles:     \n",
    "    # Construct a ZipFile object with the filename, and then extract it.\n",
    "    zip_ref = zipfile.ZipFile( file ).extractall( OutputPath ) \n",
    "    \n",
    "    source_path = OutputPath + '/cleaned_data(50_50)'\n",
    "    img_list = os.listdir( source_path )\n",
    "\n",
    "    for img in img_list:\n",
    "        shutil.move( source_path + '/' + img, OutputPath ) # Move a file to another location. \n",
    "    \n",
    "    shutil.rmtree( OutputPath + '/cleaned_data(50_50)' ) \n",
    "    print( f'Decompress successfully {file} ......' )\n",
    "\n",
    "print( 'Moving images according to traditional Chinese characters......' )\n",
    "ImageList = os.listdir( OutputPath  )\n",
    "ImageList = [ img for img in ImageList if len(img)>1 ]\n",
    "WordList = list( set([ w.split('_')[0] for w in ImageList ]) )\n",
    "\n",
    "for w in WordList:\n",
    "    try:\n",
    "        os.chdir( OutputPath ) # Change the current working directory to OutputPath.\n",
    "        os.mkdir( w ) # Create the new word folder in OutputPath.\n",
    "        MoveList = [ img for img in ImageList if w in img ]\n",
    "                  \n",
    "    except: \n",
    "        os.chdir( OutputPath )\n",
    "        MoveList = [ img for img in ImageList if w in img ]\n",
    "    \n",
    "    finally:            \n",
    "        for img in MoveList:\n",
    "            old_path = OutputPath + '/' + img\n",
    "            new_path = OutputPath + '/' + w + '/' + img\n",
    "            shutil.move( old_path, new_path )\n",
    "\n",
    "print( 'Data Deployment completed.' )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 250712 images (4803 charaters).\n"
     ]
    }
   ],
   "source": [
    "WordCount = 0\n",
    "SampleCount = 0\n",
    "\n",
    "for item in os.listdir( OutputPath ):\n",
    "    WordCount += 1\n",
    "    for i in os.listdir( OutputPath + '/' + item ):\n",
    "        SampleCount +=1\n",
    "\n",
    "print( f'There are {SampleCount} images ({WordCount} charaters).' )"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Traditional_Chinese_Handwriting_Dataset.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
